Performance of Information Retrieval Models Using Term Co-occurrences
Authors
Abstract
Many advanced models have been developed for information retrieval in recent years. These models draw on various artificial intelligence paradigms to improve retrieval precision. Most of them exploit some form of term co-occurrence to improve retrieval quality. In this paper, we compare the retrieval performance of five of these models: the Extended Boolean model, the Generalized Vector Space model, the Frequent Set model, the Rough Set model, and a Genetic-Based model. These models are tested on three sub-collections from TREC (Text REtrieval Conference). We analyze the specificity of each model with regard to the form of co-occurrence it introduces, and report on the retrieval performance and scalability of each model.
Similar resources
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Applying the Expertise Retrieval Model to Find Expert Authors
This research applied the Expertise Retrieval model to find expert authors, and used evaluation methods from Information Retrieval systems to measure the performance of those models. The research is experimental. In addition, a variety of methods, including a survey, were used in the research process. Various models were developed for finding expert authors, all built on a known ...
A Mathematical View of Latent Semantic Indexing: Tracing Term Co-occurrences
Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of information retrieval systems. We propose the development of a theoretical foundation for understanding the values produced in the reduced form of the term-term matrix. We assert that LSI’s use of higher orders of co-occurrence is a critical component of this study. In this work we present...
Factors Affecting Student's Scientific Information Retrieval based on Fuzzy Logic Method Compared to Traditional Method
Background and aim: The aim of this study was to identify the factors affecting students' performance in information retrieval based on the fuzzy logic method compared to the traditional method. Materials and methods: This descriptive survey study used a quantitative approach. The research population was 34 PhD students, and a researcher-made questionnaire was used. Data were analyzed...
On the Concept of Correct Hits in Spoken Term Detection
In most Information Retrieval (IR) tasks the aim is to find human-comprehensible items of information in large archives. One such task is spoken term detection (STD), where we look for user-entered keywords in a large audio database. To evaluate the performance of a spoken term detection system, we have to know the real occurrences of the keywords entered. Although there are standard auto...
Publication date: 2007